Skip to content

Pace bulk refresh against the GitHub API quota#473

Open
sctice-ifixit wants to merge 4 commits into
masterfrom
backfill-quota-pacing
Open

Pace bulk refresh against the GitHub API quota#473
sctice-ifixit wants to merge 4 commits into
masterfrom
backfill-quota-pacing

Conversation

@sctice-ifixit

@sctice-ifixit sctice-ifixit commented Jun 23, 2026

Copy link
Copy Markdown
Member

Connects #467

Why

Stacked on #472 (review/merge that first). The bulk refreshes (bin/refresh-* and the startup open-pulls refresh) share one GitHub token, and one hourly quota, with the live webhook/socket refreshes that keep Pulldasher current. A backfill is a tight loop, so it drains the quota to zero and starves normal operation, taking Pulldasher down. Pulldasher already had @octokit/plugin-throttling, and #472 added @octokit/plugin-retry, but both only react at exhaustion (429/5xx), which is too late. This adds a proactive pacer that reserves quota headroom for live traffic and spreads bulk work across the rate-limit reset window.

What changed

  • New lib/pacer.js, a process-wide pacer. observe(headers) records x-ratelimit-remaining/-reset/-used from every response (live calls included, so its view stays current), fed by the after/error hooks in git-manager.js. gate() waits before a bulk request. The pure computePace even-spreads over the reset window (interval = (reset - now) / (remaining - reserve)) and hard-pauses below a configurable reserve floor (config.github.bulkReserve, default 1000, 20% of the 5000/hr token). The token-bucket cursor advances by the x-ratelimit-used delta between gates, so a pull's multi-call fan-out (and any live spike) pushes bulk's next slot out. Pacing tracks real consumption, and bulk yields automatically.
  • Gating sits off the shared work queue: at enqueue (pushAllOnQueue) and between fetch pages (pacedPaginate in git-manager.js), never in the NotifyQueue consumer. A paged or floor-paused backfill can't block the single consumer that webhook refreshes also use, so live traffic keeps draining throughout. The single-item path (pushOnQueue, used by webhook/socket refreshes) is unchanged and never paces.
  • config.example.js documents bulkReserve.

I gated at enqueue rather than in the queue's pop handler on purpose. A gate() wait in the pop handler blocks the single shared NotifyQueue consumer, so every webhook item queued behind a paused bulk item stalls too (up to the roughly 1h floor-pause). The one queue exists (commit 041e000, 2016) to serialize refreshes and to never refresh the same issue twice concurrently (racing DB writes). A separate bulk queue would reintroduce that hazard, so enqueue-side gating keeps both guarantees and keeps the wait off the queue.

Tests

5 unit tests for the pure computePace (node --test): unknown quota allows, a stale reset window allows, at/below the reserve pauses to reset, the first gate records the usage baseline without delaying, and the cursor advances by the requests actually spent since the last gate (a 1-call gate vs an 8-call gate).

Validation (QA)

Deployed to pulldasher-dev. The startup open-pulls refresh (456 pulls) paced and spread across the reset window (pulldasher:pacer logs), peaking around a 22s inter-slot delay during the initial enqueue burst and then settling. remaining held between 4913 and 4930 throughout, never near the 1000 reserve, with 0 floor pauses. Live traffic kept flowing alongside the bulk work (581 getPull/webhook events in a 2-minute window), 0 errors, restarts=0. The prod-style token reads a 5000/hr ceiling.

The refresh-* backfills and the startup open-pulls refresh share one GitHub token — and one hourly quota — with the live webhook/socket refreshes that keep Pulldasher current. A bulk run is a tight loop, so it drains the quota to zero and starves normal operation, taking Pulldasher down. The throttle plugin only reacts once the quota is already exhausted, which is too late to protect live traffic.

Add a proactive pacer (lib/pacer.js). It observes x-ratelimit-* on every response (live calls included, so its view stays fresh) and gates only the bulk path: requests are spread across the rate-limit reset window, and bulk pauses entirely below a configurable reserve floor (config.github.bulkReserve, default 1000). The token-bucket cursor advances by the x-ratelimit-used delta between gates, so a pull's multi-call fan-out — and any live-traffic spike — pushes bulk's next slot out; bulk yields automatically on both signals.

Gate at enqueue (pushAllOnQueue) and between fetch pages (pacedPaginate), never inside the shared queue consumer. The wait happens off the queue, so a paced or floor-paused backfill can't park the single consumer that webhook refreshes also depend on — live traffic keeps draining throughout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
djmetzle
djmetzle previously approved these changes Jun 23, 2026

@djmetzle djmetzle left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. CR ⏱️

Picks up master (merged into the base branch) under the stacked pacing work.
Clean auto-merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ianrohde
ianrohde previously approved these changes Jun 23, 2026

@ianrohde ianrohde left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR ⚪

Base automatically changed from backfill-transient-error-tolerance to master June 25, 2026 16:29
@sctice-ifixit sctice-ifixit dismissed stale reviews from ianrohde and djmetzle June 25, 2026 16:29

The base branch was changed.

@sctice-ifixit

Copy link
Copy Markdown
Member Author

QA 🌷 per the description.

ianrohde
ianrohde previously approved these changes Jun 25, 2026
djmetzle
djmetzle previously approved these changes Jun 25, 2026

@djmetzle djmetzle left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR 🏃

Comment thread lib/refresh.js Outdated
@danielbeardsley

Copy link
Copy Markdown
Member

I had a feeling that I remember doing this before.. or something very similar (with a quota buffer). Hmm... Looks like we deleted it when we upgraded the octokit api: #211

https://github.com/iFixit/pulldasher/blob/811a7a222b38c43e2a85009f4c282141aba4713e/lib/rate-limit.js

Anyway, the octokit plugin supports throttling across instances but it requires a common storage medium (redis or similar) to coordinate.

@sctice-ifixit

Copy link
Copy Markdown
Member Author

Anyway, the octokit plugin supports throttling across instances but it requires a common storage medium (redis or similar) to coordinate.

I believe this pull's approach should work across instances using the headers we get back from GH as the coordination mechanism.

Comment thread lib/refresh.js Outdated
@danielbeardsley

Copy link
Copy Markdown
Member

deploy_block 👍 on the discussion above.

The first pass gated at enqueue (in `pushAllOnQueue`) and ran the server's
startup `openPulls()` refresh through that same paced path. Both share the one
queue consumer, so a floor-pause (waiting out the reserve once quota drops below
the floor) could park the consumer that live webhook and socket refreshes also
drain through, taking pulldasher down for the duration.

Move the gate onto the consumer drain instead, where each pull's `parse()` fans
out to the dozen-plus API calls that are the bulk of a backfill's spend. Only the
CLI backfill bins install a real pacer (`createPacedRefresh`); the server,
webhooks, and socket refreshes run an unpaced no-op pacer, so the gate never
fronts live traffic. In a bin process the consumer drains nothing but bulk, so
parking it on a paced slot starves nothing.

Drop the mutable module-global pacer from `git-manager` in favor of explicit
injection. `observeRateLimit(pacer)` installs a one-time quota observer on the
shared Octokit client so the deep `parse()` fan-out (which the caller can't see)
still feeds the pacer; the list functions take an optional pacer to pace
pagination. `pacedPaginate` gates only between pages, never after the last.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sctice-ifixit sctice-ifixit dismissed stale reviews from djmetzle and ianrohde via 67ca46f June 30, 2026 23:04
@sctice-ifixit

Copy link
Copy Markdown
Member Author

un_deploy_block 🌷

The split (push the items, then `return new Promise`) was load-bearing only
while enqueueing was async: the old loop awaited `pacer.gate()` between pushes,
which can't run inside a synchronous Promise executor. With pacing moved to the
consumer the loop is a plain `forEach`, so the pushes fold into the executor.
It runs synchronously, so the queue ordering (items, then the resolve sentinel)
is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@ianrohde ianrohde left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers.

Optional follow-up: lib/git-manager.js:22 console.log(config.github) dumps the oauth token to stdout. Not introduced here, but adjacent to the diff.

@danielbeardsley

Copy link
Copy Markdown
Member

I read deeply through this and I approve!

However, unless I'm missing something, we'd still want to use a larger bulkReserve for cli-launched scripts vs the config used for the server.

Right now, I think it's possible that a cli script can starve out the front-end (Two separate pacers). If they have different bulk-reserves then the the cli refresh will only proceed while the frontend is idle and the token bucket builds back up.

@sctice-ifixit

Copy link
Copy Markdown
Member Author

Right now, I think it's possible that a cli script can starve out the front-end (Two separate pacers).

Right now, the CLI is the only path that is paced, and it will never (modulo bugs) permit using any of the last 1000 quota available per hour. So once a CLI script sees there are less than 1000 requests remaining, it will pause until the quota is refilled (GH tells us when that will be in the headers). So the front-end should always have at least 1000 requests to work with in any given hour (in practice more), and it always goes as fast as it can. If we think 1000 isn't conservative enough, we can up it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants